Merriam-Webster's Collegiate Dictionary
TENTH EDITION

Excerpt from page 1228 



thought-out \-ˈau̇t\ adj (1870) : produced or arrived at through mental effort and esp. through careful and thorough consideration

thoughtway \-ˌwā\ n (ca. 1944) : a way of thinking that is characteristic of a particular group, time, or culture

thousand \ˈthau̇-zᵊn(d)\ n, pl thousands or thousand [ME, fr. OE thūsend; akin to OHG dūsunt thousand, Lith tūkstantis, and prob. to Skt tavas strong, L tumēre to swell — more at THUMB] (bef. 12c) 1 — see NUMBER table 2 : a very large number <~s of ants> — thousand adj — thou·sand·fold \-zᵊn(d)-ˌfōld\ adj or adv — thousandth

Thousand Island dressing n [Thousand Islands, islands in the St. Lawrence River] (1924) : mayonnaise with chili sauce and seasonings (as chopped pimientos, green peppers, and onion)

thousand-legger \ˈthau̇-zᵊn(d)-ˈle-gər, -ˈlā-\ n (1914) : MILLIPEDE

thousands place n (1937) : the place four to the left of the decimal point in a number expressed in the Arabic system of writing numbers

Thra·cian \ˈthrā-shən\ n (1569) 1 : a native or inhabitant of Thrace 2 : the Indo-European language of the ancient Thracians — see INDO-EUROPEAN LANGUAGES table — Thracian adj

¹thrall \ˈthrȯl\ n [ME thral, fr. OE thrǣl, fr. ON thræll] (bef. 12c) 1 a : a servant slave : BONDMAN; also : SERF b : a person in moral or mental servitude 2 a : a state of servitude or submission <in ~ to his emotions> b : a state of complete absorption <mountains could hold me in ~ with a subtle attraction of their own — Elyne Mitchell> — thrall adj — thrall·dom or thral·dom \ˈthrȯl-dəm\ n

²thrall vt (13c) archaic : ENTHRALL, ENSLAVE

new technology

The use of a genetic map of biallelic 
markers in linkage studies 

Leonid Kruglyak 

Improvements in genetic mapping techniques have driven recent progress in human genetics. The use of single 
nucleotide polymorphisms (SNPs) as biallelic genetic markers offers the promise of rapid, highly automated 
genotyping. As maps of SNPs and the techniques for genotyping them are being developed, it is important to consider
what properties such maps must have in order for them to be useful for linkage studies. I examine how polymorphic 
and densely spaced biallelic markers need to be for extraction of most of the inheritance information from human 
pedigrees, and compare maps of biallelics with today's genome-scanning sets of microsatellite markers. I conclude that 
a map of 700-900 moderately polymorphic biallelic markers is equivalent — and a map of 1,500-3,000 superior— to the 
current 300-400 microsatellite marker sets.



The revolution in human genetics that has unfolded over the past decade and a half has been driven largely by the development of genetic maps. The original concept was proposed by Botstein et al., with restriction fragment length polymorphisms (RFLPs) as markers¹. The first human RFLP was quickly identified², and Huntington's disease soon became the first autosomal disorder linked to an anonymous DNA marker³. The first RFLP map of the human genome followed shortly⁴. RFLPs were based on a variety of polymorphisms at the sequence level (single nucleotide changes, insertions and deletions, repeat length polymorphisms) and were assayed by Southern hybridization. Although a great advance, RFLPs were often not very polymorphic, and they were costly and time-consuming to develop and assay in large numbers. Nevertheless, these markers made human molecular genetics a reality and led to the mapping of a number of important mendelian diseases.

The next major advance came with the discovery and development of microsatellites (STRs or SSLPs) as markers⁵. These loci are abundant, have fairly high polymorphism rates and can be assayed by PCR, leading to lower cost and a greater degree of automation. Dense maps of microsatellites are now available⁶,⁷, allowing simple mendelian diseases to be mapped with relative ease and enabling first searches for genetic causes of complex diseases by genome scan. However, the requirements to assay the loci on gels and to distinguish several length-based alleles make it hard to fully automate the genotyping process, and typing large numbers of individuals for markers covering the genome remains beyond the resources of all but a few labs. There is thus a need to move beyond this current technology.



Fig. 1 Expected lod score (ELOD) for a dominant locus is plotted against information content. Each circle represents the results of a simulation for one of 130 maps, as described in Methods. The solid line shows the expected linear relation if information content of 0 corresponds to an ELOD of 0 and information content of 1 corresponds to the maximum achievable ELOD of 6.02 in these pedigrees.



Recent attention has focused on the use of single nucleotide polymorphisms (SNPs) as genetic markers. At first glance, this may appear to represent a step back to the days of low polymorphism rates characteristic of RFLPs. However, modern technology should allow efficient assays of SNPs in numbers sufficiently large to offset their lower polymorphism rates, as discussed below. SNPs offer a number of important advantages over microsatellites. They are highly abundant, with classic estimates of more than 1 per 1,000 base pairs, or more than 3 million in the genome⁸,⁹. To date, more than 1,000 PCR-amplifiable SNP markers have been discovered and mapped (U. Wang, pers. comm.). Because SNPs have only two (common) alleles (hence the term 'biallelics'), genotyping them requires only a plus/minus assay rather than a length measurement, permitting easier automation. Several non-gel-based assays have been proposed¹⁰⁻¹⁴, with high-density oligonucleotide arrays currently showing great promise for typing large numbers of biallelic markers in parallel¹⁵,¹⁶.

Here I consider the feasibility of carrying out linkage studies with a genetic map based on biallelic markers. The key questions are: What level of polymorphism is required? and How many markers adequately cover the genome? These questions are addressed below.



Assumptions 

The effects of marker density and polymorphism were examined by simulating pedigree genotype data and measuring the information content¹⁷,¹⁸ for a broad range of map densities and polymorphism levels (see Methods for simulation details). Information content measures the fraction of inheritance information extracted by the map relative to that which would be extracted by an infinitely dense polymorphic map. Thus, an information content of 1 reflects complete information, whereas an information content of 0 reflects no information. Information content incorporates both marker density and polymorphism in a single general measure of map quality that is independent of assumptions about a particular disease locus. It also closely predicts the power of a map to detect linkage, for example as measured by the expected lod score (ELOD; Fig. 1).

Fig. 2 Information content for five map densities is plotted against the frequency of the more common of the two alleles of a biallelic marker. The circles show actual simulation data points.

Table 1 • Information content for biallelics

Table 2 • Information content for microsatellites

The markers were assumed to be evenly spaced, and information content was measured at a location halfway between two markers, where it is expected to be lowest. For clarity, a single pedigree structure is used throughout: first-cousin pairs with parents but not grandparents available for genotyping. Extensive simulations show that although the absolute numbers differ somewhat for other pedigree structures, all the main conclusions about the relative importance of marker polymorphism and density continue to hold.

How polymorphic do biallelic markers need to be? 

Biallelic markers vary in their rates of polymorphism: the more common allele can range in frequency from 50% to nearly 100%. In considering a map of biallelic markers, it is important to ask whether only near-perfect (50-50) biallelics are useful or whether less polymorphic markers can provide comparable amounts of information. To answer this question, I measured information content in simulations of maps of biallelic markers with varying degrees of polymorphism.

The results (Fig. 2, Table 1) clearly indicate that at higher map densities, allele frequency has only a small effect on information content in the range of frequency distributions from 50-50 to 80-20. Specifically, a 1-cM map of 60-40 biallelics provides an information content of 0.88, essentially the same as perfect 50-50 biallelics at this density, while 70-30 biallelics provide an information content of 0.87, and 80-20 biallelics provide an information content of 0.84. The information content drops to 0.73 for 90-10 biallelics. Thus, the use of biallelic markers with frequency distribution as skewed as 80-20 leads to little reduction in the information content of a dense map. For sparser maps of 5-10 cM, a similar conclusion holds for marker allele frequency distributions as skewed as 70-30.
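
The allele-frequency splits discussed in this section can be placed on the heterozygosity scale that the paper uses when comparing marker types. The following is a minimal Python sketch, assuming the standard definition of expected heterozygosity, H = 1 - sum(p_i^2); the function name is illustrative and does not come from the paper.

```python
def heterozygosity(allele_freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2) for a single marker."""
    total = sum(allele_freqs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)

# Biallelic markers with increasingly skewed allele frequencies
for common in (0.5, 0.6, 0.7, 0.8, 0.9):
    h = heterozygosity([common, 1.0 - common])
    label = f"{round(common * 100)}-{round((1 - common) * 100)}"
    print(f"{label} biallelic: H = {h:.2f}")

# A microsatellite with four equally frequent alleles
print(f"4 equal alleles: H = {heterozygosity([0.25] * 4):.2f}")
```

Under this definition a 70-30 marker still has H = 0.42 against 0.50 for a 50-50 marker, which fits the small drop in information content reported for dense maps.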

How dense does a map of biallelic markers need to be? 

Although there is a limit on how polymorphic a biallelic marker can be (a 50-50 distribution of the two alleles), there is essentially no theoretical limit on map density (or marker number), as reasonably polymorphic SNPs can be found roughly every 1 kb, or about 3 million times in the human genome (see above). Thus, one answer to how many markers are needed is that more is always better. For common linkage study designs, however, the addition of markers provides diminishing returns once most of the inheritance information has been extracted. As shown above, a 1-cM map of 50-50 biallelic markers extracts 88% of the available information, and it is unlikely that higher information content is needed in an initial screen for linkage. What is the information cost of decreasing the density of the map? Simulation results (Fig. 2) show that map density plays a more critical role than marker polymorphism. A 2-cM map provides information content of 0.75, a 3-cM map 0.65 and a 5-cM map 0.50. Together with the results of the previous section, these numbers lead to the conclusion that for initial linkage studies it is desirable to screen a dense (1-3-cM) map of moderately polymorphic (50-50 to 80-20) biallelic markers. Interesting regions can then be followed up with all available (biallelic and microsatellite) markers.

Fig. 3 Information content is plotted against marker spacing for selected microsatellite (heterozygosity H=0.75) and biallelic (heterozygosity H=0.5) markers. Arrows connect the points on the two curves where information content reaches 0.75 (top) and 0.54 (bottom), the values for 5-cM and 10-cM microsatellite maps, respectively.

nature genetics volume 17 September 1997

It is worth noting that there are two separate issues regarding map density: how many markers exist and how many markers can be genotyped rapidly and cost-effectively. Although current microsatellite maps cover the genome at an average spacing of less than 1 cM (with more than 5,000 markers in the final Généthon map alone⁷), genotyping more than a few hundred markers in a large collection of families remains beyond the power of today's technology and research budgets. Thus, the practical limit on the number of biallelic markers will depend on the techniques for marker development and genotyping. Nonetheless, it is interesting to compare such maps with current maps of microsatellite markers. Such a comparison is carried out in the next section.

Comparison of maps based on biallelics and microsatellites

Current genome scans typically employ a 10-cM map of microsatellite markers for the initial screen¹⁹,²⁰, followed by denser coverage of regions that yield interesting results. (Although one could employ a 'staged search' strategy of starting with a sparser 20-cM map and then increasing the density in all moderately positive regions²¹,²², economies of scale in large genotyping labs usually argue for a one-stage initial scan: using a single optimized set of markers for all projects is more efficient than assembling a different set for each.) Microsatellite markers typically vary between 0.6 and 0.8 in heterozygosity (for instance, an average of 0.7 in the final Généthon map⁷), and for simplicity I will use microsatellites with four equally frequent alleles (heterozygosity of 0.75) as representative in the following comparisons with biallelics with two equally frequent alleles (heterozygosity of 0.5); results for other values are given in Tables 1 and 2. Intuitively, one would expect two closely linked biallelics to provide the same information as one microsatellite, and simulations largely confirm this intuition. A 10-cM map of microsatellites achieves information content of 0.54 (Fig. 3). The same information content is provided by a 4.5-cM map of biallelic markers. A denser 5-cM microsatellite map achieves an information content of 0.75, as does a 2-cM map of biallelics. In general, maps of biallelic markers at about 2.25-2.5 times the density of microsatellites provide a comparable information content. A 10-cM map of 300 microsatellite markers can therefore be replaced by a 4-cM map of 750 biallelic markers. These conclusions are in rough agreement with the results of an earlier study of the trade-off between marker spacing and polymorphism²³.
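
The equivalent spacings quoted in this section can be checked roughly against the information-content values given earlier for 50-50 biallelics. The sketch below is in Python and assumes straight-line interpolation between the quoted points; the paper reads these values off simulated curves, so this is only an approximation, and the helper name is hypothetical.

```python
# (spacing in cM, information content) for 50-50 biallelics, as quoted in the text
BIALLELIC_POINTS = [(1.0, 0.88), (2.0, 0.75), (3.0, 0.65), (5.0, 0.50)]

def spacing_for_information(target, points=BIALLELIC_POINTS):
    """Linearly interpolate the marker spacing that yields `target` information content."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Information content falls as spacing grows, so y0 >= y1 on each segment.
        if y1 <= target <= y0:
            return x0 + (y0 - target) * (x1 - x0) / (y0 - y1)
    raise ValueError("target lies outside the tabulated range")

for label, spacing_cm, info in (("10-cM microsatellites", 10.0, 0.54),
                                ("5-cM microsatellites", 5.0, 0.75)):
    equivalent = spacing_for_information(info)
    print(f"{label}: biallelic spacing ~{equivalent:.2f} cM, "
          f"density ratio ~{spacing_cm / equivalent:.2f}")
```

Under this approximation the 0.54 target falls close to 4.5 cM and the 0.75 target at 2 cM, in line with the spacings and the density ratio stated above.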

As technology improves, it is likely that screening a much denser map of biallelic markers will be cheaper and easier than carrying out today's genome scans employing microsatellites¹⁵,¹⁶. There are reasons to employ such denser maps. As shown above, current scan densities lead to considerable loss of information. This problem is more serious for data-sets consisting of more distantly related affecteds or of progeny of consanguineous marriages used in homozygosity mapping²⁴. It is therefore worth noting that a 1-cM map of biallelics (about 3,000 markers) yields much higher information content than a 10-cM map of microsatellites (0.88 vs. 0.54), and is superior to a 5-cM microsatellite map (0.88 vs. 0.75).
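
The marker counts used in this comparison follow from dividing map length by marker spacing. The short Python sketch below assumes a genome length of about 3,000 cM; that figure is inferred from the text's own pairing of a 10-cM map with 300 markers and is not stated by the author.

```python
GENOME_LENGTH_CM = 3000  # assumption inferred from "10-cM map of 300 markers"

def markers_needed(spacing_cm, genome_length_cm=GENOME_LENGTH_CM):
    """Approximate number of evenly spaced markers for a given spacing."""
    return round(genome_length_cm / spacing_cm)

for spacing in (10, 5, 4, 2, 1):
    print(f"{spacing}-cM map: about {markers_needed(spacing)} markers")
```

With this assumption a 4-cM map needs about 750 markers and a 1-cM map about 3,000, matching the counts given in the text.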



Practical linkage analysis using biallelic markers

Because of the lower polymorphism rates of biallelic markers, it is critical to consider many linked markers simultaneously; indeed, all the above results assume complete multipoint analysis of all markers on a chromosome. Such multipoint analysis is even more important for biallelics than for microsatellites. Fortunately, recently developed algorithms and software allow multipoint analysis with an essentially unlimited number of linked markers to be carried out for sib pairs¹⁷ as well as for general pedigrees of moderate size¹⁸. These methods can also be used for automatic haplotype reconstruction, avoiding the tedious prospect of haplotyping many biallelics by hand. The one remaining challenge is extending multipoint analysis with many markers to large multi-generational families, although even here the situation is improving²⁵.

Discussion 

The results presented here clearly demonstrate that the use of a genetic map of biallelic markers for linkage studies is feasible on theoretical grounds. It is not necessary to find only 'perfect' 50-50 biallelics: markers with allele frequency distributions as skewed as 70-30 or even 80-20 are almost as useful in a dense map. This result should allay the concern that markers discovered in one population may not be sufficiently informative in other populations with different allele frequencies. A 1-2-cM map of moderately polymorphic biallelic markers is superior to today's microsatellite screening sets for extracting inheritance information and should provide a more efficient tool for initial genome scans.

Even denser maps should enable novel study designs for dissecting genetically complex phenotypes. In particular, genome scans for linkage disequilibrium (LD) and association may become practical. Because LD mapping relies on detecting recombinationally conserved regions around an ancestral mutation, the required map density will vary with the age and history of the study population, with very dense maps (spacing of 10 kb or less) likely to be needed for LD scans in a mixed general population. A more promising approach may be to screen in parallel functional (coding) biallelic polymorphisms in many genes for direct association (rather than LD) with disease.

Maps of biallelic markers and the technology to genotype them should be forthcoming¹⁵,¹⁶, and the resulting progress in human genetics will be exciting to watch.

Methods 

Simulations. Segregation of a chromosome of 100-cM length with evenly spaced markers was simulated. For biallelics, the frequencies of the common allele were 0.5, 0.6, 0.7, 0.8, 0.9, 0.95 and 0.99. For microsatellites, equally frequent alleles were assumed, with allele numbers of 3, 4, 5, 10, 20 and 100. Marker spacings of 1, 2, . . . , 10 cM were examined. Each simulation consisted of 100 replicates of 10 cousin pairs each. Information content was computed with GENEHUNTER¹⁸. Information content was measured halfway between the two markers closest to the middle of the chromosome. For ELOD computation, a dominant disease locus with full penetrance, no phenocopies and allele frequency of 0.001 was assumed to lie halfway between two markers, and chromosomes were simulated assuming that both cousins were affected. GENEHUNTER was used to compute multipoint lod scores. The relationship between information content and ELOD is preserved for other assumptions about the disease locus (data not shown). Simulation software used to generate the data is available from the author and can be used to explore additional map properties and pedigree structures.
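
The maximum ELOD of 6.02 cited in the caption of Fig. 1 can be recovered from the disease model described in Methods. The Python sketch below works through the arithmetic under three assumptions that are not spelled out in the text: markers are fully informative, there is no recombination between marker and disease locus, and first cousins share one allele identical by descent (IBD) at an unlinked locus with the standard prior probability of 1/4. With full penetrance and no phenocopies, every affected pair is taken to share the rare disease allele IBD.

```python
import math

PRIOR_IBD_FIRST_COUSINS = 0.25  # P(first cousins share one allele IBD) at an unlinked locus
PAIRS_PER_REPLICATE = 10        # cousin pairs per simulated data set (from Methods)

# With complete information every affected pair is observed to share the
# disease allele IBD, so the likelihood ratio per pair is 1 / prior.
lod_per_pair = math.log10(1.0 / PRIOR_IBD_FIRST_COUSINS)
max_elod = PAIRS_PER_REPLICATE * lod_per_pair

print(f"lod per pair: {lod_per_pair:.3f}")
print(f"maximum ELOD: {max_elod:.2f}")
```

Each pair contributes log10(4), about 0.602, so ten pairs give about 6.02.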
ABSTRACT The analysis of DNA for the presence of particular mutations or polymorphisms can be readily accomplished by differential hybridization with sequence-specific oligonucleotide probes. The in vitro DNA amplification technique, the polymerase chain reaction (PCR), has facilitated the use of these probes by greatly increasing the number of copies of target DNA in the sample prior to hybridization. In a conventional assay with immobilized PCR product and labeled oligonucleotide probes, each probe requires a separate hybridization. Here we describe a method by which one can simultaneously screen a sample for all known allelic variants at an amplified locus. In this format, the oligonucleotides are given homopolymer tails with terminal deoxyribonucleotidyltransferase, spotted onto a nylon membrane, and covalently bound by UV irradiation. Due to their long length, the tails are preferentially bound to the nylon, leaving the oligonucleotide probe free to hybridize. The target segment of the DNA sample to be tested is PCR-amplified with biotinylated primers and then hybridized to the membrane containing the immobilized oligonucleotides under stringent conditions. Hybridization is detected nonradioactively by binding of streptavidin-horseradish peroxidase to the biotinylated DNA, followed by a simple colorimetric reaction. This technique has been applied to HLA-DQA genotyping (six types) and to the detection of Mediterranean β-thalassemia mutations (nine alleles).

Differential hybridization with sequence-specific oligonucleotide probes has become a widely used technique for the detection of genetic mutations and polymorphisms (1-5). When hybridized under the appropriate conditions, these synthetic DNA probes (usually 15-20 bases in length) will anneal to their complementary target sequences in the sample DNA only if they are perfectly matched. In most cases, the destabilizing effect of a single base-pair mismatch is sufficient to prevent the formation of a stable probe-target duplex (6). With an appropriate selection of oligonucleotide probes, the relevant genetic content of a DNA sample can be completely described.

This very powerful method of DNA analysis has been 
greatly simplified by the in vitro DNA-amplification tech- 
nique, the polymerase chain reaction (PCR) (7-9). The PCR 
can selectively increase the number of copies of a particular 
DNA segment in a sample by many orders of magnitude. As 
a result of this 10⁶- to 10⁷-fold amplification, more convenient
assays and nonradioactive detection methods have become 
possible (10-12). These PCR-based assays are usually done 
by amplifying the target segment in the sample to be tested, 
fixing the amplified DNA onto a series of nylon membranes, 
and hybridizing each membrane with one of the labeled 
oligonucleotide probes under stringent hybridization condi- 
tions. However, each probe must still be individually hybrid- 

The publication costs uf this article were defrayed in part by puge charge 
payment. Thii article must therefore be hereby marked " advertisement" 
in accordance with 18 U.S.C. §17J4 solely to indicate this fact. 



ized to the amplified DNA and the process can easily bec\>n* 
difficult , in a system where many, different mutation* *v 
polymorphisms occur. 

One approach to address this procedural difficulty is to "reverse" the DNAs: attach the oligonucleotides to the nylon support and hybridize the amplified sample to the membrane. Thus, in a single hybridization reaction, an entire series of sequences could be analyzed simultaneously. The strategy we adopted was to immobilize the oligonucleotides onto nylon filters by ultraviolet fixation. Exposure to UV light activates thymine bases in DNA, which then covalently couple to the primary amines present in nylon (13). It seemed unlikely, however, that short oligonucleotides could be directly attached to nylon in this manner and still retain their ability to discriminate at the level of a single base-pair mismatch. Consequently, the addition of a long deoxyribothymidine homopolymer tail, poly(dT), to the 3' end of the oligonucleotide appeared promising for several reasons. First, the poly(dT) tail would be a larger target for UV crosslinking and should preferentially react with the nylon. Second, dTTP is very readily incorporated onto the 3' ends of oligonucleotides by terminal deoxyribonucleotidyltransferase and would permit the synthesis of very long tails (14). (Deoxyribothymidine would also be the most efficiently incorporated base if a purely synthetic route were chosen.) Third, Collins and Hunsaker (15) had shown that the presence of a poly(dA) homopolymer tail, used to introduce multiple ³⁵S labels, did not affect the function of sequence-specific oligonucleotide probes.

We have used this technique to attach oligonucleotide probes specific for the six major HLA-DQA DNA types and the eight most common Mediterranean β-thalassemia mutations (4) to nylon filters. The target segment of the DNA sample to be tested (either HLA-DQA or β-globin) was amplified by PCR with biotin-labeled primers to introduce a nonradioactive tag. Hybridization of the amplified products to the immobilized oligonucleotides and binding of streptavidin-horseradish peroxidase conjugate to the biotinylated primers were performed simultaneously. Detection was accomplished by a simple colorimetric reaction involving the enzymatic oxidation of a colorless chromogen that yielded a color wherever hybridization occurred.
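
To give a sense of scale for the PCR amplification described in this introduction, the following Python sketch counts the minimum number of cycles needed to reach a given fold increase. It assumes ideal doubling of the target in every cycle, which real reactions only approximate, and the function name is illustrative.

```python
def min_cycles(fold_amplification):
    """Smallest number of perfect doubling cycles reaching the requested fold increase."""
    cycles, copies = 0, 1
    while copies < fold_amplification:
        copies *= 2   # one ideal PCR cycle doubles the target
        cycles += 1
    return cycles

for fold in (10**6, 10**7):
    print(f"{fold:.0e}-fold: at least {min_cycles(fold)} cycles")
```

Under ideal doubling, a million-fold increase takes 20 cycles and a ten-million-fold increase takes 24.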



MATERIALS AND METHODS 
Tailing of Oligonucleotides. Oligonucleotides were synthesized on a DNA synthesizer (model 8700, Biosearch) with β-cyanoethyl N,N-diisopropylphosphoramidite nucleosides (American Bionetics, Hayward, CA) by using protocols provided by the manufacturer. Oligonucleotide [illegible] was tailed in 100 µl of 100 mM potassium cacodylate/[illegible] Tris-HCl/1 mM CoCl2/0.2 mM dithiothreitol, pH [illegible], with 5-160 nmol deoxyribonucleoside triphosphate (d

Abbreviation: PCR, polymerase chain reaction. 
Accessing Genetic Information with
High-Density DNA Arrays

Mark Chee, Robert Yang, Earl Hubbell, Anthony Berno, Xiaohua C. Huang, David Stern, Jim Winkler, David J. Lockhart, Macdonald S. Morris, Stephen P. A. Fodor

Rapid access to genetic information is central to the revolution taking place in molecular genetics. The simultaneous analysis of the entire human mitochondrial genome is described here. DNA arrays containing up to 135,000 probes complementary to the 16.6-kilobase human mitochondrial genome were generated by light-directed chemical synthesis. A two-color labeling scheme was developed that allows simultaneous comparison of a polymorphic target to a reference DNA or RNA. Complete hybridization patterns were revealed in a matter of minutes. Sequence polymorphisms were detected with single-base resolution and unprecedented efficiency. The methods described are generic and can be used to address a variety of questions in molecular genetics including gene expression, genetic linkage, and genetic variability.



A central theme in modern genetics is the relation between genetic variability and phenotype. To understand genetic variation and its consequences on biological function, an enormous effort in comparative sequence analysis will need to be carried out. Conventional nucleic acid sequencing technologies make use of analytical separation techniques to resolve sequence at the single nucleotide level (1, 2). However, the effort required increases linearly with the amount of sequence. In contrast, biological systems read, store, and modify genetic information by molecular recognition (3). Because each DNA strand carries with it the capacity to recognize a uniquely complementary sequence through base pairing, the process of recognition, or hybridization, is highly parallel, as every nucleotide in a large sequence can in principle be queried at the same time. Thus, hybridization can be used to efficiently analyze large amounts of nucleotide sequence. In one proposal, sequences are analyzed by hybridization to a set of oligonucleotides representing all possible subsequences (4). A second approach, used here, is hybridization to an array of oligonucleotide probes designed to match specific sequences. In this way the most informative subset of probes is used. Implementation of these concepts relies on recently developed combinatorial technologies to generate any ordered array of a large number of oligonucleotide probes (5).
The fundamentals of light-directed oligonucleotide array synthesis have been described (5, 6). Any probe can be synthesized at any discrete, specified location in the array, and any set of probes composed of the four nucleotides can be synthesized in a maximum of 4N cycles, where N is the length of the longest probe in the array. For example, the entire set of ~10¹² 20-nucleotide oligomer probes, or any desired subset, can be synthesized in only 80 coupling cycles. The number of different probes that can be synthesized is limited only by the physical size of the array and the achievable lithographic resolution (7).
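
The counts in this paragraph follow from simple combinatorics. The Python sketch below assumes at most one coupling cycle per base at each probe position, which gives the 4N bound; the helper names are illustrative.

```python
BASES = "ACGT"

def num_possible_probes(length):
    """Number of distinct oligonucleotides of the given length."""
    return len(BASES) ** length

def max_synthesis_cycles(longest_probe):
    """Upper bound on coupling cycles: one per base at each probe position."""
    return len(BASES) * longest_probe

print(f"20-mers: {num_possible_probes(20):.2e} possible probes, "
      f"at most {max_synthesis_cycles(20)} cycles")
print(f"15-mers: {num_possible_probes(15):.2e} possible probes")
```

The 20-mer case gives about 1.1 x 10^12 probes in at most 80 cycles, and there are about 1.1 x 10^9 possible 15-mers.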

An array consisting of oligonucleotides complementary to subsequences of a target sequence can be used to determine the identity of a target sequence, measure its amount, and detect differences between the target and a reference sequence. Many different arrays can be designed for these purposes. One such design, termed a 4L tiled array, is depicted in Fig. 1A. In each set of four probes, the perfect complement will hybridize more strongly than mismatched probes. By this approach, a nucleic acid target of length L can be scanned for mutations with a tiled array containing 4L probes. For example, to query the 16,569 base pairs (bp) of human mitochondrial DNA (mtDNA), only 66,276 probes of the possible ~10⁹ 15-nucleotide oligomers need to be used.
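
The 4L tiling idea can be sketched in a few lines of Python. This toy version lists, for every interior position of a reference sequence, four probes that differ only at the central base, and calls the base whose probe gives the strongest signal. The probe length, the skipping of sequence ends, the use of target-sense rather than complementary probe sequences, the intensity values and all names are simplifying assumptions, not details of the published array.

```python
BASES = "ACGT"

def tile_probes(reference, flank=7):
    """Yield (position, substituted_base, probe) for a 4L tiling of `reference`.

    Each probe spans `flank` bases on either side of the interrogated position
    (15-mers for flank=7); positions too close to the ends are skipped here.
    """
    for pos in range(flank, len(reference) - flank):
        left = reference[pos - flank:pos]
        right = reference[pos + 1:pos + flank + 1]
        for base in BASES:
            yield pos, base, left + base + right

def call_base(intensities):
    """Pick the base whose probe hybridized most strongly at one position."""
    return max(intensities, key=intensities.get)

reference = "GATTACAGATTACAGATTACA"
probes = list(tile_probes(reference))
print(len(probes), "probes for", len(reference) - 14, "interior positions")

# Hypothetical intensities for the four probes at one position
print(call_base({"A": 120.0, "C": 35.0, "G": 980.0, "T": 41.0}))

# Size of a full 4L tiling of the 16,569-bp mitochondrial genome
print(4 * 16569)
```

The last line reproduces the 66,276 probes quoted for the mitochondrial genome.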

The use of a tiled array of probes to read a target sequence is illustrated in Fig. 1C. A tiled array of 15-nucleotide oligomers varied
Fig. 3. Human mitochondrial genome on a chip. (A) An image of the array hybridized to 16.6 kb of mitochondrial target RNA (L strand). The 16,569-bp map of the genome is shown, and the H strand origin of replication (O_H), located in the control region, is indicated. (B) A portion of the hybridization pattern magnified. In each column there are five probes: A, C, G, T, and Δ, from top to bottom. The Δ probe has a single-base deletion instead of a substitution and hence is 24 instead of 25 bases in length. The scale is indicated by the bar beneath the image. Although there is considerable sequence-dependent intensity variation, most of the array can be read directly. The image was collected at a resolution of ~100 pixels per probe cell. (C) The ability of the array to detect and read single-base differences in a sample is illustrated. Two different target sequences were hybridized in parallel to different chips. The hybridization patterns are compared for four different positions in the sequence. Only the P²⁵,¹³ probes are shown. The top panel of each pair shows the hybridization of the mt3 target, which matched the chip P⁰ sequence at these positions. The lower panel shows the pattern generated by a sample from a patient with Leber's hereditary optic neuropathy (LHON). Three known pathogenic mutations, among them LHON13708, are clearly detected. For comparison, the fourth panel in the set shows a region around position 11,778 that is identical in both samples.
provide the foundation for a powerful generic analysis technology. The method can be used to characterize the spectrum of sequence variation in a population and can be applied to the analysis of many genes in parallel. In the case of human mtDNA, we simultaneously analyzed the control region, 13 protein coding genes, 22 tRNA genes, and 2 ribosomal RNA genes. The methods described here can be

identify and sequence polymorphisms provides a basis for genetic mapping. The specificity of oligonucleotide hybridization and the scalability of the method suggest the possibility of a dedicated array that could be used to generate a high-resolution genetic map of an entire genome in a single experiment. Likewise, the concepts and techniques described here have been used to develop approaches for mRNA identification and the large-scale, parallel measurement of expression levels (24). Thus, the sequence of a gene, its spectrum of change in the population, its chromosomal location, and its dynamics of expression (all essential to a full understanding of function) can be determined with high-density probe arrays. The challenge now is to synthesize and read probe arrays at even higher density. For example, a 2 cm by 2 cm array, synthesized with probes occupying 1-µm synthesis sites in a 4L tiling, could query the entire coding content of the human genome, estimated at 100,000 genes.

The Future of Genetic Studies of
Complex Human Diseases



Neil Risch and Kathleen Merikangas 



Geneticists have made substantial progress in identifying the genetic basis of many human diseases, at least those with conspicuous determinants. These successes include Huntington's disease, Alzheimer's disease, and some forms of breast cancer. However, the detection of genetic factors for complex diseases, such as schizophrenia, bipolar disorder, and diabetes, has been far more complicated. There have been numerous reports of genes or loci that might underlie these disorders, but few of these findings have been replicated. The modest nature of the gene effects for these disorders likely explains the contradictory and inconclusive claims about their identification. Despite the small effects of such genes, the magnitude of their attributable risk (the proportion of people affected due to them) may be large because they are quite frequent in the population, making them of public health significance.

Has the genetic study of complex disorders reached its limits? The persistent lack of replicability of these reports of linkage between various loci and complex diseases might imply that it has. We argue below that the method that has been used successfully (linkage analysis) to find major genes has limited power to detect genes of modest effect, but that a different approach (association studies) that utilizes candidate genes has far greater power, even if one needs to test every gene in the genome. Thus, the future of the genetics of complex diseases is likely to require large-scale testing by association analysis.

N. Risch is in the Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305-5120, USA. E-mail: risch@lahmed.stanford.edu. K. Merikangas is at Yale University School of Medicine, New Haven, CT, USA.

How large does a gene effect need to be in order to be detectable by linkage analysis? We consider the following model: Suppose a disease susceptibility locus has two alleles A and a, with population frequencies p and q = 1 − p, respectively. There are three genotypes: AA, Aa, and aa. We define genotypic relative risks (GRR, the increased chance that an individual with a particular genotype has the disease) as follows: Let the risk for individuals of genotype Aa be γ times greater than the risk for individuals with genotype aa, a GRR of γ. We assume a multiplicative relation for two A alleles, so that the GRR for genotype AA is γ². The method of linkage analysis we have chosen for this argument is a popular current paradigm in which pairs of siblings, both with the disease, are examined for sharing of alleles at multiple sites in the genome defined by genetic markers. The more often the affected siblings share the same allele at a particular site, the more likely the site is close to the disease gene. Using the formulas in (1), we calculate the expected proportion Y of alleles shared by a pair of affected siblings for the best possible case, that is, a closely linked marker locus (recombination fraction θ = 0) that is fully informative (heterozygosity = 1) (2), as

$$Y = \frac{1 + w}{2 + w}, \qquad \text{where } w = \frac{pq(\gamma - 1)^2}{(p\gamma + q)^2}.$$

If there is no linkage of a marker at a particular site to the disease, the siblings would be expected to share alleles 50% of the time; that is, Y would equal 0.5. Values of Y for various values of p and γ are given in the third column of the table. For an allele of moderate frequency (p is 0.1 to 0.5) that confers a GRR (γ) of fourfold or greater, there is a detectable deviation of Y from the null value of 0.5. On the other hand, for an allele conferring a GRR of 2 or less, the expected marker sharing only marginally exceeds 50%, for any allele frequency (p). Thus, it is clear that the use of linkage analysis for loci conferring GRR of about 2 or less will never allow identification because the number of families required (more than ~2500) is not practically achievable.
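
The dependence of allele sharing on p and γ described above can be checked numerically. The following is a minimal sketch, assuming the closed form Y = (1 + w)/(2 + w) with w = pq(γ − 1)²/(pγ + q)² given in note 2; the function name and the grid of p and γ values are illustrative choices, not taken from the article's table.

```python
def allele_sharing(p: float, gamma: float) -> float:
    """Expected proportion Y of alleles shared by an affected sib pair.

    Assumes a fully informative marker at recombination fraction 0 and
    multiplicative genotypic relative risks (1, gamma, gamma**2).
    """
    q = 1.0 - p
    w = p * q * (gamma - 1.0) ** 2 / (p * gamma + q) ** 2
    return (1.0 + w) / (2.0 + w)


if __name__ == "__main__":
    # Illustrative grid only; these values are assumptions for the demo.
    for gamma in (4.0, 2.0, 1.5):
        for p in (0.01, 0.10, 0.50, 0.80):
            y = allele_sharing(p, gamma)
            print(f"gamma={gamma:<4} p={p:<5} Y={y:.3f}")
```

With γ = 1 the function returns exactly 0.5, the null value, and for γ = 1.5 the sharing stays close to 0.51 at p = 0.5, in line with the statement that a GRR of 2 or less barely moves Y above 50%.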

Although tests of linkage for genes of modest effect are of low power, as shown by the above example, direct tests of association with a disease locus itself can still be quite strong. To illustrate this point, we use the transmission/disequilibrium test of Spielman et al. (3). In this test, transmission of a particular allele at a locus from heterozygous parents to their affected offspring is examined. Under Mendelian inheritance, all alleles should have a 50% chance of being transmitted to the next generation. In contrast, if one of the alleles is associated with disease risk, it will be transmitted more often than 50% of the time.

For this approach, we do not need families with multiple affected siblings, but can focus just on single affected individuals and their parents. For the same model given above, we can calculate the proportion of heterozygous parents as pq(γ + 1)/(pγ + q) (4). Similarly, the probability for a heterozygous parent to transmit the high-risk A allele is just γ/(1 + γ). Association tests can also be performed for pairs of affected siblings. When the locus is associated with disease, the transmission excess over 50% is the same as for single offspring, but the probability of parental heterozygosity is increased at low values of p; for higher values of p, the probability of parental heterozygosity is decreased. The formula for parental heterozygosity for an affected pair of siblings for the same genetic model as used in the first example is

$$\frac{pq(\gamma + 1)^2}{2(p\gamma + q)^2 + pq(\gamma - 1)^2}.$$

On the right side of the table, we present the proportion of heterozygous parents (Het) and the probability of transmission of the A allele from a heterozygous parent to an affected child [P(tr-A)] for the same values of GRR considered above for the example of linkage analysis. The deviation from the null hypothesis of 50% transmission from heterozygous parents is substantially greater than the excess allele sharing that is found by linkage analysis in sibling pairs. This disparity between the methods is particularly true for lower values of γ (that is, with lower relative risk). For example, for γ = 1.5, allele sharing is at most 51%, while the A allele is transmitted 60% of the time from heterozygous parents.
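
The quantities used by the transmission/disequilibrium argument can be sketched in a few lines. This is an illustration only, assuming the expressions stated in the text: pq(γ + 1)/(pγ + q) for parents of a single affected child, pq(γ + 1)²/[2(pγ + q)² + pq(γ − 1)²] for parents of an affected sib pair, and γ/(1 + γ) for transmission of A. The function names are hypothetical.

```python
def het_singleton(p: float, gamma: float) -> float:
    """Probability that a parent of one affected child is heterozygous (Aa)."""
    q = 1.0 - p
    return p * q * (gamma + 1.0) / (p * gamma + q)


def het_sib_pair(p: float, gamma: float) -> float:
    """Probability that a parent of an affected sib pair is heterozygous (Aa)."""
    q = 1.0 - p
    numerator = p * q * (gamma + 1.0) ** 2
    denominator = 2.0 * (p * gamma + q) ** 2 + p * q * (gamma - 1.0) ** 2
    return numerator / denominator


def prob_transmit_A(gamma: float) -> float:
    """Probability that a heterozygous parent transmits A to an affected child."""
    return gamma / (1.0 + gamma)


if __name__ == "__main__":
    print(prob_transmit_A(1.5))                               # 60% transmission
    print(het_singleton(0.01, 4.0), het_sib_pair(0.01, 4.0))  # rare allele
    print(het_singleton(0.80, 4.0), het_sib_pair(0.80, 4.0))  # common allele
```

At γ = 1 both heterozygosity functions reduce to the Hardy-Weinberg value 2pq, which is a quick sanity check on the two formulas. The rare-allele and common-allele lines show the behavior described above: sib-pair ascertainment raises parental heterozygosity when p is small and lowers it when p is large.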

In this respect then, association studies seem to be of greater power than linkage studies. But of course, the limitation of association studies is that the actual gene or genes involved in the disease must be tentatively identified before the test can be performed. In fact, the actual polymorphism within the gene (or at least a polymorphism in strong disequilibrium) must be available. However, we show that this requirement is only daunting because of limitations imposed by current technological capabilities, not because sufficient families with the disease are not available or the statistical power is inadequate (5). For example, imagine the time when all human genes (say 100,000 in total) have been found and that simple, diallelic polymorphisms in these genes have been identified. Assume that five such diallelic polymorphisms have been identified within each gene, so that a total of 10 × 10⁵ = 10⁶ alleles need to be tested. The statistical problem is that the large number of tests that need to be made leads to an inflation of the type I error probability. For a linkage test with pairs of affected siblings, we use a lod score (logarithm of the odds ratio for linkage) criterion of 3.0, which asymptotically corresponds to a type I error probability α of about 10⁻⁴. In a linkage genome screen with 500 markers, this significance level gives a probability greater than 95% of no false positives. The equivalent false positive rate for 1,000,000 independent association tests can be obtained with a significance level α = 5 × 10⁻⁸.

We illustrate the power of linkage versus association tests at different significance levels by determining the sample size N (number of families) necessary to obtain 80% power (the probability of rejecting the null hypothesis when it is false) (6) (see table). With a linkage approach and a disease gene with a GRR of 4 or greater, the number of affected sibling pairs necessary to detect linkage is realistic (185 or 297), provided the allele frequency p is between 5 and 75%. For a gene with a GRR of 2 or less, however, the sample sizes are generally beyond reach (well over 2000), precluding their identification by this approach. In contrast, the required sample size for the association test, even allowing for the smaller significance level, is vastly less than for linkage, especially for affected sibling pair families when the value of p is small. Even for a GRR of 1.5, the sample sizes are generally less than 1000, well within reason.
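
The two significance levels quoted above (α of about 10⁻⁴ for 500 linkage markers and α = 5 × 10⁻⁸ for 1,000,000 association tests) can be related with a short calculation. The sketch below assumes fully independent tests, as the text does for the association case; the function names are hypothetical.

```python
def prob_no_false_positive(alpha: float, n_tests: int) -> float:
    """Probability that none of n_tests independent tests is a false positive."""
    return (1.0 - alpha) ** n_tests


def matched_alpha(alpha_ref: float, n_ref: int, n_new: int) -> float:
    """Per-test alpha giving n_new tests the same overall rate as the reference screen."""
    target = prob_no_false_positive(alpha_ref, n_ref)
    return 1.0 - target ** (1.0 / n_new)


if __name__ == "__main__":
    print(prob_no_false_positive(1e-4, 500))        # linkage screen, 500 markers
    print(prob_no_false_positive(5e-8, 1_000_000))  # association screen, 10^6 tests
    print(matched_alpha(1e-4, 500, 1_000_000))      # close to 5e-8
```

Both screens give a probability slightly above 0.95 of no false positives, which is the sense in which the two thresholds are equivalent.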

Thus, the primary limitation of genome-wide association tests is not a statistical one but a technological one. A large number of genes (up to 100,000) and polymorphisms (preferentially ones that create alterations in derived proteins or their expression) must first be identified, and an extremely large number of such polymorphisms will need to be tested. Although testing such a large number of polymorphisms on several hundred, or even a thousand families, might currently seem implausible in scope, more efficient methods of screening a large number of polymorphisms (for example, sample pooling) may be possible. Furthermore, the number of tests we have used as the basis for our calculations (1,000,000) is likely to be far larger than necessary if one allows for linkage disequilibrium, which could substantially reduce the required number of markers and families needed for initial screening.

Some of the important loci for complex diseases will undoubtedly be found by linkage analysis. However, the limitations to detecting many of the remaining genes by linkage studies can be overcome; numerous genetic effects too weak to identify by linkage can be detected by genomic association studies. Fortunately, the samples currently collected for linkage studies (for example, affected pairs of siblings and their parents) can also be used for such association studies. Thus, investigators should preserve their samples for future large-scale testing.

The human genome project can have more than one reward. In addition to sequencing the entire human genome, it can lead to identification of polymorphisms for all the genes in the human genome and the diseases to which they contribute. It is a charge to the molecular technologists to develop the tools to meet this challenge and provide the information necessary to identify the genetic basis of complex human diseases.

References and Notes 

1. N. Risch, Am. J. Hum. Genet. 40, 1 (1987); ibid. 46, 229 (1990).

2. From the formulas in (1), we have λ_O = 1 + 0.5 V_A/K² and λ_S = 1 + (0.5 V_A + 0.25 V_D)/K², where K = p²γ² + 2pqγ + q² = (pγ + q)², V_A = 2pq(γ − 1)²(pγ + q)², and V_D = p²q²(γ − 1)⁴. Hence, λ_O = 1 + w and λ_S = (1 + 0.5w)², where w = pq(γ − 1)²/(pγ + q)². The proportion of alleles shared is given by Y = 1 − 0.5 z_1 − z_0, where z_1 and z_0 are the probabilities of the sib pair sharing 1 and 0 disease alleles ibd, respectively. From (1), z_0 = 0.25/λ_S and z_1 = 0.5 λ_O/λ_S. Thus, after some algebra, Y = 1 − 0.25(λ_O + 1)/λ_S = (1 + w)/(2 + w).

SCIENCE • VOL. 273 • 13 SEPTEMBER 1996

3. R. S. Spielman, R. E. McGinnis, W. J. Ewens, Am. J. Hum. Genet. 52, 506 (1993).

4. By Bayes' theorem, the probability of a parent of an affected child being heterozygous is given by P(Het|Aff child) = P(Het)P(Aff child|Het)/P(Aff child) = 2pq[0.5p(γ² + γ) + 0.5q(γ + 1)]/(pγ + q)² = pq(γ + 1)/(pγ + q).

5. E. S. Lander and N. J. Schork, Science 265, 2037 (1994).

6. Consider a set of M independent, identically distributed random variables B_i of discrete value. Under the null hypothesis H_0, let E(B_i) = 0 and Var(B_i) = 1. Under the alternative hypothesis H_1, let E(B_i) = μ and Var(B_i) = σ². For a sample of size M, let T = Σ_i B_i/√M. Then under H_0, T has mean 0 and variance 1, while under H_1, T has mean √M μ and variance σ². We assume that T is approximately normally distributed both under H_0 and H_1. Then the sample size M required to obtain a power of 1 − β for a significance level α is given by

$$M = \frac{(Z_\alpha - \sigma Z_{1-\beta})^2}{\mu^2} \qquad (1)$$

For each affected sib pair, we score the number of alleles shared ibd from each of 2N parents. Define B_i = 1 if an allele is shared from the ith parent and B_i = −1 if unshared. Under the null hypothesis of no linkage, P(B_i = 1) = P(B_i = −1) = 0.5, so E(B_i) = 0 and Var(B_i) = 1. For the genetic model described above with genotypic relative risks of γ², γ, and 1, allele sharing by affected sibs is independent for the two parents; thus, we can consider sharing of alleles one parent at a time. Thus, for affected sib pairs assuming θ = 0 and no linkage disequilibrium, the formula is

$$N = \frac{(Z_\alpha - \sigma Z_{1-\beta})^2}{2\mu^2}, \qquad \mu = 2Y - 1, \qquad \sigma^2 = 4Y(1 - Y), \qquad Y = \frac{1 + w}{2 + w}, \qquad w = \frac{pq(\gamma - 1)^2}{(p\gamma + q)^2},$$

Z_α = 3.72 (corresponding to α = 10⁻⁴), and Z_{1−β} = −0.84 (corresponding to 1 − β = 0.80). For an association test using the transmission/disequilibrium test, with the disease locus or a nearby locus in complete disequilibrium, the number (N) of families with affected singletons required for 80% power is also calculated from formula 1. For this case, we score the number of transmissions of allele A from heterozygous parents. Let h be the probability a parent is heterozygous under the alternative hypothesis, namely, h = pq(γ + 1)/(pγ + q). Then define B_i = h^(−0.5) if the parent is heterozygous and allele A is transmitted; B_i = 0 if the parent is homozygous; and B_i = −h^(−0.5) if the parent is heterozygous and transmits allele a. Under the null hypothesis, E(B_i) = 0 and Var(B_i) = 1. Under the alternative hypothesis, μ = E(B_i) = √h (γ − 1)/(γ + 1) and σ² = Var(B_i) = 1 − h(γ − 1)²/(γ + 1)². In this case, there are two parents per family and they act independently, so the required number (N) of families is given by half of formula 1 where μ and σ are given above. Here, Z_α = 5.33 (corresponding to α = 5 × 10⁻⁸). For the same test but with affected sib pairs instead of singletons, the number of families required is given by one-quarter of formula 1 (transmissions from two parents to two children) with the same formulas for μ and σ as for singleton families but now using the heterozygote frequency for parents of affected sib pairs. Using the above formulas, we can calculate sample sizes for the three study designs.
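
The association sample sizes in note 6 can be sketched in code. This is an illustration under stated assumptions, not the authors' program: it applies formula 1, M = (Z_α − σ Z_{1−β})²/μ², with Z_α = 5.33 and Z_{1−β} = −0.84 as quoted in the note, and divides M by the number of scored transmissions per family (2 for singleton families, 4 for sib-pair families). All function names are hypothetical.

```python
from math import sqrt


def het_singleton(p: float, gamma: float) -> float:
    """Heterozygosity of a parent of one affected child."""
    q = 1.0 - p
    return p * q * (gamma + 1.0) / (p * gamma + q)


def het_sib_pair(p: float, gamma: float) -> float:
    """Heterozygosity of a parent of an affected sib pair."""
    q = 1.0 - p
    return p * q * (gamma + 1.0) ** 2 / (
        2.0 * (p * gamma + q) ** 2 + p * q * (gamma - 1.0) ** 2
    )


def families_tdt(p: float, gamma: float, sib_pairs: bool = False,
                 z_alpha: float = 5.33, z_power: float = -0.84) -> float:
    """Families needed for the transmission/disequilibrium test via formula 1."""
    h = het_sib_pair(p, gamma) if sib_pairs else het_singleton(p, gamma)
    ratio = (gamma - 1.0) / (gamma + 1.0)
    mu = sqrt(h) * ratio                    # E(B_i) under the alternative
    sigma = sqrt(1.0 - h * ratio ** 2)      # sqrt of Var(B_i) under the alternative
    m = (z_alpha - sigma * z_power) ** 2 / mu ** 2  # formula 1
    transmissions_per_family = 4 if sib_pairs else 2
    return m / transmissions_per_family


if __name__ == "__main__":
    for p in (0.01, 0.10, 0.50, 0.80):      # illustrative frequencies
        print(p, round(families_tdt(p, 1.5)), round(families_tdt(p, 1.5, True)))
```

For γ = 1.5 and p = 0.5 the singleton design needs a little under 1000 families under these assumptions, which agrees with the remark in the text that sample sizes for a GRR of 1.5 are generally below 1000.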

27 October 1995; accepted 6 June 1996.

Document for use in Amendment/Responses 9/13/05 and 11/05 for application 10/037,718
Applicants MCGINNIS ET AL 

Illustration of an Example Set/Subset N-covering using a CL-F map: Subsets
of bi-allelic covering markers N-cover an entire, large rectangular CL-F region bounded by a
chromosome or chromosomal subregion of interest in the chromosomal location (CL) dimension
and bounded by the range 0 to 0.5 in the (least common allele) frequency (F) dimension.

Subsets of bi-allelic covering markers are chosen whereby each of the 10 smaller rectangular segment-subranges
designated A, B, C, D or E (on the left and right) contains two or more covering markers that belong to the same
subset. These 10 overlapping segment-subranges completely cover the entire, large rectangular CL-F region
(arrows pointing to the top boundary and right boundary). Each of these 10 segment-subranges is less than or equal
to L2 in length and equal to 0.15 in width. So each point in the entire large rectangular region is within the two-
dimensional (CL-F) distance [L2, 0.15] of two or more covering markers. That is, the entire large rectangular
region is N-covered to within [L2, 0.15] by the bi-allelic covering markers, wherein N ≥ 2.
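
The N-covering condition described above can be sketched as a simple grid check. This is only an illustration under loudly stated assumptions: "within the distance [L2, 0.15]" is read as a CL difference of at most L2 and an F difference of at most 0.15; subset membership of the markers is ignored; the region is sampled on a finite grid rather than tested at every point; and the marker positions, the value L2 = 40, and the function names are all hypothetical.

```python
def covering_count(point, markers, max_cl, max_f):
    """Count markers whose CL and F coordinates both lie within the tolerances."""
    cl, f = point
    return sum(
        1
        for m_cl, m_f in markers
        if abs(m_cl - cl) <= max_cl and abs(m_f - f) <= max_f
    )


def is_n_covered(markers, cl_range, f_range, max_cl, max_f, n=2, steps=50):
    """Check N-coverage on a (steps + 1) x (steps + 1) grid of sample points."""
    cl_lo, cl_hi = cl_range
    f_lo, f_hi = f_range
    for i in range(steps + 1):
        cl = cl_lo + (cl_hi - cl_lo) * i / steps
        for j in range(steps + 1):
            f = f_lo + (f_hi - f_lo) * j / steps
            if covering_count((cl, f), markers, max_cl, max_f) < n:
                return False
    return True


if __name__ == "__main__":
    # Hypothetical markers as (chromosomal location, least-common-allele frequency).
    dense = [(cl, f) for cl in (25, 30, 70, 75) for f in (0.125, 0.375)]
    sparse = [(25, 0.125), (75, 0.375)]
    print(is_n_covered(dense, (0, 100), (0.0, 0.5), max_cl=40, max_f=0.15))
    print(is_n_covered(sparse, (0, 100), (0.0, 0.5), max_cl=40, max_f=0.15))
```

A grid check of this kind can miss gaps between sample points, so a finer `steps` value or an exact interval-based test would be needed before relying on the result.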




[Figure: CL-F map. Labels: Segment 1 of length L1; Segment 2 of length L2; horizontal axis: chromosome or chromosomal subregion of interest.]